Faithful Model Explanations through Energy-Constrained Conformal Counterfactuals
Delft University of Technology
May 13, 2024
\[ \begin{aligned} \min_{\mathbf{Z}^\prime \in \mathcal{Z}^L} \{ {\text{yloss}(M_{\theta}(f(\mathbf{Z}^\prime)),\mathbf{y}^+)} + \lambda {\text{cost}(f(\mathbf{Z}^\prime)) } \} \end{aligned} \]
Counterfactual Explanations (CE) explain how inputs into a model need to change for it to produce different outputs.
All of these counterfactuals are valid explanations for the model’s prediction.
Which one would you pick?
Figure 2: Turning a 9 into a 7: Counterfactual explanations for an image classifier produced using Wachter (Wachter, Mittelstadt, and Russell 2017), Schut (Schut et al. 2021) and REVISE (Joshi et al. 2019).
We propose ECCCo: a new way to generate faithful model explanations that are as plausible as the underlying model permits.
Definition 1 (Plausible Counterfactuals) Let \(\mathcal{X}|\mathbf{y}^+= p(\mathbf{x}|\mathbf{y}^+)\) denote the true conditional distribution of samples in the target class \(\mathbf{y}^+\). Then for \(\mathbf{x}^{\prime}\) to be considered a plausible counterfactual, we need: \(\mathbf{x}^{\prime} \sim \mathcal{X}|\mathbf{y}^+\).
Why Plausibility?
Plausibility is positively associated with actionability, robustness (Artelt et al. 2021) and causal validity (Mahajan, Tan, and Sharma 2020).
Definition 2 (Faithful Counterfactuals) Let \(\mathcal{X}_{\theta}|\mathbf{y}^+ = p_{\theta}(\mathbf{x}|\mathbf{y}^+)\) denote the conditional distribution of \(\mathbf{x}\) in the target class \(\mathbf{y}^+\), where \(\theta\) denotes the parameters of model \(M_{\theta}\). Then for \(\mathbf{x}^{\prime}\) to be considered a faithful counterfactual, we need: \(\mathbf{x}^{\prime} \sim \mathcal{X}_{\theta}|\mathbf{y}^+\).
Trustworthy Models
If the model posterior approximates the true posterior (\(p_{\theta}(\mathbf{x}|\mathbf{y}^+) \rightarrow p(\mathbf{x}|\mathbf{y}^+)\)), faithful counterfactuals are also plausible.
Key Idea
Use the hybrid objective of joint energy models (JEM) and a model-agnostic penalty for predictive uncertainty: Energy-Constrained (\(\mathcal{E}_{\theta}\)) Conformal (\(\Omega\)) Counterfactuals (ECCCo).
ECCCo objective1:
\[ \begin{aligned} & \min_{\mathbf{Z}^\prime \in \mathcal{Z}^L} \{ {L_{\text{clf}}(f(\mathbf{Z}^\prime);M_{\theta},\mathbf{y}^+)}+ \lambda_1 {\text{cost}(f(\mathbf{Z}^\prime)) } \\ &+ \lambda_2 \mathcal{E}_{\theta}(f(\mathbf{Z}^\prime)|\mathbf{y}^+) + \lambda_3 \Omega(C_{\theta}(f(\mathbf{Z}^\prime);\alpha)) \} \end{aligned} \]
The code used to run the analysis for this work is built on top of CounterfactualExplanations.jl, part of Taija.
📜 Explaining Black-Box Models through Counterfactuals in JuliaCon Proceedings.
With thanks to my co-authors Mojtaba Farmanbar, Arie van Deursen and Cynthia C. S. Liem.